research-article

Direct posterior confidence for out-of-vocabulary spoken term detection

Authors:
Dong Wang, EURECOM, Sophia Antipolis, France
Simon King, University of Edinburgh, Edinburgh, United Kingdom
Nicholas Evans, EURECOM, Sophia Antipolis, France
Joe Frankel, University of Edinburgh, Edinburgh, United Kingdom
Raphaël Troncy, EURECOM, Sophia Antipolis, France

SSCS '10: Proceedings of the 2010 international workshop on Searching spontaneous conversational speech, October 2010, Pages 21–26, https://doi.org/10.1145/1878101.1878107

Published: 29 October 2010

ABSTRACT

Spoken term detection (STD) is a fundamental task in spoken information retrieval. Unlike conventional speech transcription and keyword spotting, STD is an open-vocabulary task and must therefore handle out-of-vocabulary (OOV) terms. Approaches based on subword units, e.g. phonemes, are widely used to address the OOV issue; however, performance on OOV terms is still significantly inferior to that on in-vocabulary (INV) terms.

The performance degradation on OOV terms can be attributed to a multitude of factors. The factor we address in this paper is that the acoustic and language models used for speech transcription are highly vulnerable to OOV terms, which leads to unreliable confidence measures and error-prone detections.

A direct posterior confidence measure derived from discriminative models has been proposed for STD. In this paper, we use this technique to tackle the weakness of confidence estimation on OOV terms. Because neither acoustic models nor language models are included in the computation, the new confidence measure avoids the weak modeling problem with OOV terms. Our experiments, conducted on multi-party meeting speech, which is highly spontaneous and conversational, demonstrate that the proposed technique improves STD performance on OOV terms significantly; when combined with conventional lattice-based confidence, a significant improvement in performance is obtained on both INV and OOV terms. Furthermore, the new confidence measure can be combined with other advanced techniques for OOV treatment, such as stochastic pronunciation modeling and term-dependent confidence discrimination, leading to an integrated solution for OOV STD with greatly improved performance.
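The abstract does not give the formula for the direct posterior confidence, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes a discriminative model (for example a neural network) has already produced per-frame phone posteriors, that a detection is given as a phone-to-frame alignment, and that the confidence is the geometric mean of the posteriors of the aligned phones. The function names, the `Segment` layout, and the linear interpolation with a lattice-based score are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: (phone_index, start_frame, end_frame), end exclusive.
Segment = Tuple[int, int, int]


def direct_posterior_confidence(
    posteriors: Sequence[Sequence[float]],
    segments: List[Segment],
    floor: float = 1e-10,
) -> float:
    """Geometric mean of the frame-level posteriors of the aligned phones.

    posteriors[t][p] is assumed to be the posterior of phone p at frame t,
    as produced by some discriminative model. No acoustic-model likelihood
    or language-model score enters the computation.
    """
    log_sum = 0.0
    n_frames = 0
    for phone, start, end in segments:
        for t in range(start, end):
            # Floor the posterior to avoid log(0).
            log_sum += math.log(max(posteriors[t][phone], floor))
            n_frames += 1
    if n_frames == 0:
        raise ValueError("detection covers no frames")
    return math.exp(log_sum / n_frames)


def combine_confidences(direct: float, lattice: float, weight: float = 0.5) -> float:
    """Linear interpolation of two confidence scores (an assumed scheme)."""
    return weight * direct + (1.0 - weight) * lattice


# Toy example: 3 frames, 2 phones; phone 0 spans frames 0-1, phone 1 spans frame 2.
toy_posteriors = [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8]]
toy_segments = [(0, 0, 2), (1, 2, 3)]
toy_direct = direct_posterior_confidence(toy_posteriors, toy_segments)
toy_combined = combine_confidences(toy_direct, lattice=0.4, weight=0.5)
```

In this sketch the direct score depends only on the posteriors along the hypothesized alignment, which is the property the abstract credits for the robustness on OOV terms. The interpolation weight would have to be tuned on held-out data.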

Index Terms

1. Information systems
  1. Information retrieval

Published in
SSCS '10: Proceedings of the 2010 international workshop on Searching spontaneous conversational speech
October 2010
72 pages
ISBN: 9781450301626
DOI: 10.1145/1878101
Program Chairs:
Martha Larson, Delft University of Technology, Netherlands
Roeland Ordelman, Netherlands Institute for Sound & Vision and University of Twente, Netherlands
Florian Metze, Carnegie Mellon University, USA
Franciska de Jong, University of Twente, Netherlands
Wessel Kraaij, TNO and Radboud University, Netherlands
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
speech document search
speech recognition
spoken term detection
spontaneous conversational speech